Goal: Customer Segmentation

Customer segmentation creates information that supports decision making by matching the right customers with the right services and products. With customer segmentation, actions that address customers' concerns can be targeted with greater precision.

In this analysis, K-means and agglomerative clustering were applied to the market campaign dataset.

Libraries Used

Importing Libraries

Importing the Dataset

Data Cleaning

Because there are many columns, we'll apply label encoding, which replaces each unique value in a column with an integer label.
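As a minimal sketch of label encoding, the snippet below uses scikit-learn's `LabelEncoder` on a toy dataframe. The column names and values here are assumptions for illustration, not the dataset's actual contents.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Master", "Graduation"],
    "Marital_Status": ["Single", "Married", "Married", "Divorced"],
})

# Hypothetical list of the categorical columns we want to encode.
categorical_cols = ["Education", "Marital_Status"]

encoded = df.copy()
encoders = {}
for col in categorical_cols:
    enc = LabelEncoder()
    # Each unique value is replaced by an integer (classes are sorted first).
    encoded[col] = enc.fit_transform(encoded[col])
    # Keep the fitted encoder so the codes can be mapped back later.
    encoders[col] = enc

print(encoded)
```

Keeping the fitted encoders around makes it possible to call `inverse_transform` when the clusters are interpreted later.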

Let's carry out some feature engineering

Total size of the family
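One way such a feature could be derived is sketched below with pandas. The columns `Living_With`, `Kidhome` and `Teenhome` are hypothetical names used only for this example; the real dataframe may be organised differently.

```python
import pandas as pd

# Toy data: all column names are assumptions for illustration.
df = pd.DataFrame({
    "Living_With": ["Partner", "Alone", "Partner"],
    "Kidhome": [1, 0, 2],
    "Teenhome": [0, 1, 1],
})

# Number of adults in the household, inferred from the living situation.
adults = df["Living_With"].map({"Alone": 1, "Partner": 2})

# Children of any age living at home.
df["Children"] = df["Kidhome"] + df["Teenhome"]

# Family size = adults + children.
df["Family_Size"] = adults + df["Children"]

print(df)
```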

Now we can see what our new dataframe looks like.

A pairplot helps us see the pairwise relationships in a dataset. It creates a grid of Axes such that each variable in the data is shared across the y-axes of a single row and the x-axes of a single column.

Outlier Detection

Interquartile Range
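A common form of this rule drops values that lie more than 1.5 times the IQR below the first quartile or above the third. Here is a small sketch on made-up income values; the 1.5 multiplier and the numbers are assumptions, not taken from the dataset.

```python
import pandas as pd

# Made-up income values with one obvious outlier (illustration only).
income = pd.Series(
    [30_000, 42_000, 51_000, 58_000, 64_000, 75_000, 666_666],
    name="Income",
)

q1 = income.quantile(0.25)   # first quartile
q3 = income.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range

# Conventional fences: 1.5 * IQR beyond each quartile.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the values inside the fences.
filtered = income[income.between(lower, upper)]

print(f"bounds: [{lower}, {upper}]")
print(filtered)
```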

As you can see, there are no more outliers in the data.

This isn't how we want it to be.

Now let's check the income column again.

Correlation

Data preprocessing

PRINCIPAL COMPONENT ANALYSIS

Dimension Reduction

  • For example: improving model performance or enabling visualization.

By performing this operation, we'll be able to look at the data from a different point of view as new attributes can be extracted.
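A minimal sketch of reducing data to three components with scikit-learn follows. The random input stands in for the real features, and standardising before PCA is an assumption about the preprocessing step.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for the real feature matrix: 200 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# PCA is sensitive to scale, so standardise the features first (assumed step).
X_scaled = StandardScaler().fit_transform(X)

# Project onto the three directions of greatest variance.
pca = PCA(n_components=3, random_state=0)
X_3d = pca.fit_transform(X_scaled)

print(X_3d.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute shows how much of the original variance each new attribute retains, which helps judge how much information the 3D view loses.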

Image of data reduced to 3D

Modelling

KMeans

K-means starts by randomly choosing the center points of k clusters. Each remaining data point is then assigned to the cluster whose center is nearest. Next, the mean of each cluster is calculated and becomes the new cluster center. The assignment and update steps are repeated until the centers stop changing.
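To make these steps concrete, here is a plain NumPy sketch of the loop. It is an illustration under simple assumptions (Euclidean distance, initial centers drawn from the data), not the library implementation used for the actual modelling; the function name `kmeans` and the toy blobs are made up for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Illustrative K-means loop; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated toy blobs (illustration only).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0, 0.5, size=(50, 2)),
    rng.normal(10, 0.5, size=(50, 2)),
])
labels, centers = kmeans(X, k=2)
print(centers)
```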

Plotting the inertia curve using the elbow method
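The values behind such a curve can be computed as in the sketch below, which fits scikit-learn's `KMeans` for several values of k and records the inertia (within-cluster sum of squares). The synthetic blobs and the range of k are assumptions; the plotting call itself is left out.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data: 4 blobs in 2D.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    # inertia_ is the sum of squared distances to the closest center.
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Plotting `inertias` against k and looking for the point where the drop flattens out gives the "elbow" that suggests a reasonable number of clusters.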

Hierarchical Clustering

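Hierarchical clustering typically works by sequentially combining similar clusters. In the agglomerative variant, each sample starts as its own cluster and the most similar pair of clusters is merged at every step.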
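A minimal sketch with scikit-learn's `AgglomerativeClustering` is shown below. The toy data, the number of clusters and the Ward linkage are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated toy blobs (illustration only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.5, size=(30, 2)),
    rng.normal(8, 0.5, size=(30, 2)),
])

# Ward linkage merges the pair of clusters that least increases
# the within-cluster variance at each step.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

print(np.bincount(labels))
```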

Let's create a dendrogram.
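The sketch below builds the merge tree with SciPy and computes the dendrogram layout. The toy data and the Ward linkage are assumptions; `no_plot=True` is used so that the example only computes the structure, and dropping it draws the figure when matplotlib is available.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Two small toy blobs (illustration only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.5, size=(10, 2)),
    rng.normal(8, 0.5, size=(10, 2)),
])

# Each row of Z records one merge: the two clusters joined,
# the distance between them, and the size of the new cluster.
Z = linkage(X, method="ward")

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)
print("leaf order:", tree["leaves"])

# Cutting the tree into at most 2 clusters gives flat labels.
flat_labels = fcluster(Z, t=2, criterion="maxclust")
print(flat_labels)
```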

Quantities of samples in each cluster.

Cluster profile based on income and spent.
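A profile like this can be summarised numerically with a pandas group-by, as sketched below. The column names `Cluster`, `Income` and `Spent` and all values are made up for illustration.

```python
import pandas as pd

# Toy data: column names and values are assumptions for illustration.
df = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 1],
    "Income": [30_000, 40_000, 80_000, 90_000, 100_000],
    "Spent": [100, 200, 1_200, 1_500, 1_800],
})

# One row per cluster: sample count plus mean income and mean spending.
profile = df.groupby("Cluster").agg(
    Count=("Income", "size"),
    Mean_Income=("Income", "mean"),
    Mean_Spent=("Spent", "mean"),
)

print(profile)
```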

PROFILING